An improved Similarity Measure For Chinese Text Clustering
Similar resources
Improved Similarity Measure For Text Classification And Clustering
Computing the similarity between documents is an important operation in text processing. In this paper, a new similarity measure is proposed. To calculate the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: I) the same feature appears in both documents, II) the same feature appears in only one document, and III) ...
Striving for an Improved Audio Similarity Measure
In this submission to MIREX’07, we implement various modifications to the G1C algorithm by Elias Pampalk, which ranked first in last year’s MIREX AudioSim task. Although each of the modifications showed only minor effects in our experiments, their combination consistently outperformed the original algorithm in our automated tests. Therefore, we consider it worth submitting the resulting algorithm ...
Text Clustering Using a Suffix Tree Similarity Measure
In the text mining area, popular methods use bag-of-words models, which represent a document as a vector. These methods ignore word sequence information, and their good clustering results are limited to some special domains. This paper proposes a new similarity measure based on a suffix tree model of text documents. It analyzes the word sequence information, and then computes the similarity between ...
An Improved Algorithm for Text Document Clustering
Due to the advancement of the internet, the volume of electronic documents available on the web is increasing day by day. Document clustering plays an important role in the organization and summarization of these documents. Thus, developing a fast and effective document clustering algorithm is of great importance. This paper presents an improved algorithm for document clustering. This algorithm is an ...
An improved semantic similarity measure for document clustering based on topic maps
A major computational burden in document clustering is the calculation of the similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents, depending on the degree of similarity between them. A value of zero means that the documents are completely dissimilar, whereas a value of one indicates that ...
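As an illustration of a similarity function that maps a pair of documents to a real number between 0 and 1, the sketch below uses cosine similarity over raw word counts. This is a generic, assumed example and not the topic-map measure of the paper summarized above; the function name `cosine_similarity` and the whitespace tokenization are choices made here for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents over raw word counts.

    Illustrative assumption: tokens are lowercased, whitespace-separated words.
    Counts are non-negative, so the result lies in [0, 1].
    """
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())

    # Dot product over the words the two documents share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty document has no direction; treat it as completely dissimilar.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0

    # Clamp to guard against floating-point results slightly above 1.
    return min(1.0, dot / (norm_a * norm_b))
```

Two documents with no words in common score 0, and a document compared with itself scores 1 (up to floating-point rounding). Because every pair of documents needs one such computation, clustering n documents requires on the order of n² similarity evaluations, which is the computational burden the abstract refers to.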
Journal
Journal title: DEStech Transactions on Engineering and Technology Research
Year: 2016
ISSN: 2475-885X
DOI: 10.12783/dtetr/icmite20162016/4588